Characterizing a Brain-Based Value-Function Approximator
Abstract
The field of Reinforcement Learning (RL) in machine learning relates significantly to the domains of classical and instrumental conditioning in psychology, which give an understanding of biology's approach to RL. In recent years, there has been a push to correlate machine learning RL algorithms with brain structure and function, a benefit to both fields. Our focus has been on one such structure, the striatum, from which we have built a general model. In machine learning terms, this model is equivalent to a value-function approximator (VFA) that learns according to Temporal Difference (TD) error. In keeping with a biological approach to RL, the present work seeks to evaluate the robustness of this striatum-based VFA using biological criteria. We selected five classical conditioning tests to expose the learning accuracy and efficiency of the VFA for simple state-value associations. Manually setting the VFA's many parameters to reasonable values, we characterize it by varying each parameter independently and repeatedly running the tests. The results show that this VFA is both capable of performing the selected tests and quite robust to changes in parameters. Test results also reveal aspects of how this VFA encodes reward value.
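The abstract does not spell out the underlying update rule, but the construct it names, a value-function approximator trained on TD error, has a standard minimal form. The sketch below is an illustrative linear TD(0) learner, not the striatum-based model itself; the class name, feature encoding, and parameter values are assumptions.

```python
import numpy as np

# Minimal sketch of a linear value-function approximator trained on TD(0)
# error -- the machine-learning construct the abstract maps the striatum
# model onto.  Class name, features, and parameter values are illustrative,
# not the parameters of the striatum-based VFA itself.
class LinearVFA:
    def __init__(self, n_features, alpha=0.1, gamma=0.95):
        self.w = np.zeros(n_features)   # one weight per state feature
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # temporal discount factor

    def value(self, phi):
        """Estimated value of a state described by feature vector phi."""
        return float(np.dot(self.w, phi))

    def update(self, phi, reward, phi_next):
        """One TD(0) step: nudge the weights along the TD error."""
        td_error = reward + self.gamma * self.value(phi_next) - self.value(phi)
        self.w += self.alpha * td_error * phi
        return td_error


# Toy classical-conditioning analogue: a cue state reliably followed by reward.
vfa = LinearVFA(n_features=2)
cue, after = np.array([1.0, 0.0]), np.array([0.0, 1.0])
for _ in range(100):
    vfa.update(cue, reward=1.0, phi_next=after)
print(vfa.value(cue))   # approaches 1.0 as the cue comes to predict the reward
```

The toy loop mirrors a simple state-value association of the kind the paper's classical conditioning tests probe: with repeated cue-reward pairings, the cue's estimated value converges toward the reward it predicts.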
Similar papers
Tight Performance Bounds on Greedy Policies Based on Imperfect Value Functions
Consider a given value function on the states of a Markov decision problem, as might result from applying a reinforcement learning algorithm. Unless this value function equals the corresponding optimal value function, at some states there will be a discrepancy, which it is natural to call the Bellman residual, between what the value function specifies at that state and what is obtained by a one-step lookahead...
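The bound alluded to here has a well-known general shape. As a reference point, stated under standard assumptions for a discounted MDP and not necessarily in the tightened form proved in the paper, the Bellman residual and the resulting guarantee for the greedy policy can be written as:

```latex
% epsilon: the largest Bellman residual of the approximate value function V,
% measured with the Bellman optimality operator T.
\[
  \varepsilon = \max_{s} \bigl| (TV)(s) - V(s) \bigr|,
  \qquad
  (TV)(s) = \max_{a} \Bigl[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \Bigr].
\]
% The policy pi_V that acts greedily with respect to V is then near-optimal:
\[
  \bigl\| V^{*} - V^{\pi_V} \bigr\|_{\infty} \le \frac{2\gamma\,\varepsilon}{1 - \gamma}.
\]
```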
Reinforcement learning on an omnidirectional mobile robot
In this paper we describe a well-suited, scalable problem for reinforcement learning approaches in the field of mobile robots. We show a suitable representation of the problem for a reinforcement learning approach and present our results with a standard model-based algorithm. Two different approximators are used for the value function: a grid-based approximator and a neural-network-based approximator.
Universal Approximator Property of the Space of Hyperbolic Tangent Functions
In this paper, the space of hyperbolic tangent functions is first introduced, and then the universal approximator property of this space is proved. In fact, by using this space, any continuous nonlinear function can be uniformly approximated to any degree of accuracy. As an application, this space of functions is also used to design feedback control for a nonlinear dynamical system.
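As a rough illustration of the universal-approximation property (not the paper's construction or its proof), one can fix randomly parameterized tanh basis functions and fit their linear combination to a continuous target by least squares; the sizes and the target function below are arbitrary choices.

```python
import numpy as np

# Approximate a continuous nonlinear function with a linear span of tanh
# basis functions; slopes and shifts are random, coefficients are fit by
# least squares.  Purely illustrative of the approximation property.
rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 200)
target = np.sin(x) + 0.5 * x**2           # an arbitrary continuous target

n_units = 30
a = rng.normal(scale=2.0, size=n_units)   # slope of each tanh unit
b = rng.uniform(-3.0, 3.0, size=n_units)  # shift of each tanh unit
Phi = np.tanh(np.outer(x, a) + b)         # design matrix of tanh features

coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)
print("max abs error:", np.max(np.abs(Phi @ coef - target)))
```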
16-899C ACRL Tetris Reinforcement Learner
Our approach to this problem was to use reinforcement learning with a function approximator to approximate the state-value function [RSS98]. In our case, a +1 reward was given for every completed line, so that the value function would encode the long-term number of lines that the algorithm is going to complete. To achieve this, we extract features from the game state, and use gr...
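One common way to realize the feature-based value function described in this snippet is a linear combination of hand-crafted board features. The feature set below (column heights, holes, bumpiness) is a hypothetical example, not the one used in this course project.

```python
import numpy as np

# Hypothetical Tetris state features feeding a linear value estimate of the
# long-run number of lines still to be completed.  Feature choices and the
# weight vector are illustrative assumptions.
def tetris_features(board):
    """board: 2-D numpy array of 0/1 cells, row 0 at the top."""
    heights, holes = [], 0
    for col in board.T:
        if col.any():
            top = int(np.argmax(col))             # first filled row from the top
            heights.append(board.shape[0] - top)  # column height
            holes += int(np.sum(col[top:] == 0))  # empty cells under the surface
        else:
            heights.append(0)
    heights = np.asarray(heights)
    bumpiness = int(np.sum(np.abs(np.diff(heights))))
    return np.array([heights.sum(), holes, bumpiness, 1.0])  # last entry: bias

def state_value(board, weights):
    """Linear state value: dot product of weights and extracted features."""
    return float(np.dot(weights, tetris_features(board)))
```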
Tile Coding Based on Hyperplane Tiles
In large and continuous state-action spaces, reinforcement learning relies heavily on function approximation techniques. Tile coding is a well-known function approximator that has been successfully applied to many reinforcement learning tasks. In this paper we introduce hyperplane tile coding, in which the usual tiles are replaced by parameterized hyperplanes that approximate the action-value...
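For context, ordinary tile coding, the baseline that hyperplane tile coding generalizes, can be sketched in a few lines. The one-dimensional state, tile counts, and offsets below are illustrative choices only.

```python
import numpy as np

# Plain tile coding over one continuous state variable: several offset
# tilings each activate exactly one tile, and the value estimate is the sum
# of the active tiles' weights.  Sizes and offsets are illustrative.
class TileCoder1D:
    def __init__(self, low, high, n_tiles=10, n_tilings=4):
        self.low = low
        self.tile_width = (high - low) / n_tiles
        self.n_tilings = n_tilings
        self.weights = np.zeros((n_tilings, n_tiles + 1))  # +1 column for offset spill

    def active_tiles(self, x):
        """Index of the single active tile in each offset tiling."""
        return [(t, int((x - self.low + t * self.tile_width / self.n_tilings)
                        // self.tile_width))
                for t in range(self.n_tilings)]

    def value(self, x):
        return sum(self.weights[t, i] for t, i in self.active_tiles(x))

    def update(self, x, target, alpha=0.1):
        """Move the active weights toward a supervised target value."""
        error = target - self.value(x)
        for t, i in self.active_tiles(x):
            self.weights[t, i] += (alpha / self.n_tilings) * error
```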